SQL - grouping some rows as a single row but not others

I have a table containing the following fields:
Location, WeekEnded and SalesOrder
I would like to sum by Location, but several locations have only a small number of orders, and I would like to group those together as a single location. I just cannot seem to get the code right.
Current results:
Location A 55
Location B 66
Location C 1
etc 11
What I would like is:
Location A 55
Location B 66
Location C-Z 12
My code at the moment is:
SELECT
Location,
WeekEnd,
sum(SOCount) as 'Sales Orders'
FROM SalesOrderHist
GROUP BY
Location,
WeekEnd
If anyone can help, I would be forever grateful.

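Use a CASE expression in your GROUP BY clause (and the same expression in the SELECT list), so that all the small locations map to one group value.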
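A minimal sketch of that idea, run against an in-memory SQLite table through Python's sqlite3 module as a stand-in for the real database. The table and column names follow the question; the list of "kept" locations, the bucket label 'Location C-Z', the week value, and the helper name grouped_totals are assumptions for illustration. The CASE expression is repeated in GROUP BY because some engines (e.g. SQL Server) do not accept a SELECT alias there.

```python
import sqlite3

# Keep the large locations as-is and fold every other location into one bucket.
QUERY = """
SELECT
    CASE WHEN Location IN ('Location A', 'Location B') THEN Location
         ELSE 'Location C-Z' END AS LocationGroup,
    WeekEnd,
    SUM(SOCount) AS SalesOrders
FROM SalesOrderHist
GROUP BY
    CASE WHEN Location IN ('Location A', 'Location B') THEN Location
         ELSE 'Location C-Z' END,
    WeekEnd
ORDER BY LocationGroup
"""


def grouped_totals(rows):
    """Load (Location, WeekEnd, SOCount) rows into SQLite and run the grouped query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SalesOrderHist (Location TEXT, WeekEnd TEXT, SOCount INTEGER)")
    conn.executemany("INSERT INTO SalesOrderHist VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

With one hypothetical week of the question's figures (A = 55, B = 66, C = 1, everything else = 11), this returns one row each for Location A and Location B and a single 'Location C-Z' row totalling 12.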

Final code, in case it helps someone. The original contained quite a bit more, so I have narrowed it down to just the important parts. I am a bit new to SQL, so there may be a more efficient way, but this worked for me.
SELECT
lname as Location,
WeekStart as 'Week Start',
sum(SOCount) as SOCount
FROM
(
-- map each row's location to its group; no GROUP BY is needed in here
SELECT
WeekStart,
SOCount,
CASE
when cast(lname as varchar(40)) in ('Location A') then 'Location A'
when cast(lname as varchar(40)) in ('Location B') then 'Location B'
when cast(lname as varchar(40)) = 'Location C' then 'Location C'
when cast(lname as varchar(40)) not in ('Location A','Location B','Location C') then 'OTHER'
else cast(lname as varchar(40))
end as lname
FROM
BDO_salesorderhist_pickeddate2
) d
-- aggregate once, in the outer query, over the mapped location and week
GROUP BY
lname,
WeekStart


Linear problem for chemical composition of a formulation

Dear all,
I'm writing a linear program for optimization.
One of my goals is to recommend to my supplier which raw-material mix to use in its product formulation in order to optimally fulfill my nutrient needs in several locations. Example:
Sets:
SET_RAW_MATERIALS = ['Urea', 'Ammonium Sulphate']
SET_NUTRIENTS = ['Nitrogen', 'Sulfur']  # Two nutrients supplied through the available raw materials
SET_LOCATIONS = ['Location A', 'Location B']  # Two locations whose nutrient demands must be fulfilled
SET_FORMULATIONS = ['Formulation 1']  # I have only one formulation to propose to my supplier
Parameters:
PARAM_NUTRIENT_DEMAND = {
    ('Location A', 'Nitrogen'): 10,
    ('Location A', 'Sulfur'): 2,
    ('Location B', 'Nitrogen'): 10,
}
PARAM_RAW_MATERIAL_NUTRIENT_CONTENT = {
    ('Ammonium Sulphate', 'Nitrogen'): 10,
    ('Ammonium Sulphate', 'Sulfur'): 5,
    ('Urea', 'Nitrogen'): 2,
}
Variables:
VAR_CHEMICAL_COMPOSITION('Formulation', 'Raw_material')  # For each formulation, the optimal share of each raw material (from 0 to 1)
VAR_VOLUME_FORMULATION('Formulation', 'Location')  # The volume of the proposed formulation to use at each location
Constraints:
(1) Nutrient demand (for each location l and nutrient n, with f the formulation):
PARAM_NUTRIENT_DEMAND[l, n] <= sum(
    VAR_VOLUME_FORMULATION[f, l] * VAR_CHEMICAL_COMPOSITION[f, r] * PARAM_RAW_MATERIAL_NUTRIENT_CONTENT[r, n]
    for r in SET_RAW_MATERIALS)
This is the only formulation I was able to come up with, but it is obviously non-linear: it multiplies two decision variables (VAR_VOLUME_FORMULATION and VAR_CHEMICAL_COMPOSITION). Is there any way to rewrite this constraint to get a LINEAR problem?
Thank you for the help.
P.S.: I know the problem is not complete; I've included only the information necessary to show the problem.
You have a blending example with docplex at https://github.com/sumeetparashar/IBM-Decision-Optimization-Introductory-notebook/blob/master/Oil-blend-student-copy.ipynb
I also recommend the diet example in
CPLEX_Studio1210\python\examples\mp\modeling
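One common way to remove the product, in the spirit of blending models like the ones linked above, is to make the amount of each raw material per location the decision variable: y[l, r] = VAR_VOLUME_FORMULATION[f, l] * VAR_CHEMICAL_COMPOSITION[f, r]. Nutrient supply is then a linear sum over y, and volume and composition can be recovered afterwards (volume = sum of y over raw materials, share = y / volume). Caveat: this is exact only if each location may receive its own mix; forcing one shared formulation across all locations reintroduces a product of variables. The sketch below (helper names are hypothetical, no solver involved) only checks numerically that both forms of the constraint's right-hand side agree for one location, using the question's data.

```python
# Nutrient content per raw material, from the question ('Ureia' spelled 'Urea' throughout).
CONTENT = {
    ("Ammonium Sulphate", "Nitrogen"): 10,
    ("Ammonium Sulphate", "Sulfur"): 5,
    ("Urea", "Nitrogen"): 2,
}
RAW_MATERIALS = ["Urea", "Ammonium Sulphate"]


def supplied_bilinear(volume, composition, nutrient):
    """Original right-hand side: volume * share * content (a product of two variables)."""
    return sum(volume * composition[r] * CONTENT.get((r, nutrient), 0) for r in RAW_MATERIALS)


def to_amounts(volume, composition):
    """Substitution y[r] = volume * share[r]: amount of each raw material used."""
    return {r: volume * composition[r] for r in RAW_MATERIALS}


def supplied_linear(amounts, nutrient):
    """Same quantity written in terms of y, which is linear in the decision variables."""
    return sum(amounts[r] * CONTENT.get((r, nutrient), 0) for r in RAW_MATERIALS)


def to_volume_and_composition(amounts):
    """Recover volume and shares from the amounts after solving."""
    volume = sum(amounts.values())
    return volume, {r: amounts[r] / volume for r in RAW_MATERIALS}
```

For example, a volume of 2 with a 50/50 mix supplies 12 units of Nitrogen and 5 of Sulfur in both forms; in an LP you would keep only the y form and add y >= 0.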

Remove repeated substring in column and only return words in between

I have the following dataframe:
Column1 Column2
0 .com<br><br>Finance<br><br><br><br><br><br><br><br><br><br><br><br> .comFinance
1 .com<br><br>Finance<br><br><br><br><br>DO<br><br><br><br><br><br><br> .comFinanceDO
2 <br><br>Finance<br><br><br>ISV<br><br>DO<br>DO Prem<br><br><br><br><br><br> FinanceISVDODO Prem
3 <br><br>Finance<br><br><br><br><br><br><br><br><br><br><br><br> Finance
4 <br><br>Finance<br><br><br>TTY<br><br><br><br><br><br><br><br><br> ConsultingTTY
I used the following line of code to get Column2:
df['Column2'] = df['Column1'].str.replace('<br>', '', regex=True)
I want to remove all instances of "<br>" and separate the remaining values with commas, so that the column looks like this:
Column2
.com, Finance
.com, Finance, DO
Finance, ISV, DO, DO Prem
Finance
Consulting, TTY
Given the following dataframe:
Column1
.com<br><br>Finance<br><br><br><br><br><br><br><br><br><br><br><br>
.com<br><br>Finance<br><br><br><br><br>DO<br><br><br><br><br><br><br>
<br><br>Finance<br><br><br>ISV<br><br>DO<br>DO Prem<br><br><br><br><br><br>
<br><br>Finance<br><br><br><br><br><br><br><br><br><br><br><br>
<br><br>Finance<br><br><br>TTY<br><br><br><br><br><br><br><br><br>
Replacing <br> with a space and then each run of whitespace with ', ' (df['Column2'] = df['Column1'].str.replace('<br>', ' ', regex=True).str.strip().replace('\\s+', ', ', regex=True)) doesn't work because of sections like <br>DO Prem<br>, which would end up as DO, Prem rather than DO Prem.
Split on <br> to make a list, then use a list comprehension to remove the empty strings ('').
This preserves spaces where they belong.
Join the list values back into a string with ', '.join([...]).
import pandas as pd
df['Column2'] = df['Column1'].str.split('<br>').apply(lambda x: (', ').join([y for y in x if y != '']))
# output
Column1 Column2
.com<br><br>Finance<br><br><br><br><br><br><br><br><br><br><br><br> .com, Finance
.com<br><br>Finance<br><br><br><br><br>DO<br><br><br><br><br><br><br> .com, Finance, DO
<br><br>Finance<br><br><br>ISV<br><br>DO<br>DO Prem<br><br><br><br><br><br> Finance, ISV, DO, DO Prem
<br><br>Finance<br><br><br><br><br><br><br><br><br><br><br><br> Finance
<br><br>Finance<br><br><br>TTY<br><br><br><br><br><br><br><br><br> Finance, TTY
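A variant of the same split-and-filter idea, sketched with Python's re module: treat one or more consecutive <br> tags as a single separator, then drop the empty pieces left by leading and trailing tags. The function name br_to_csv_list is hypothetical; on the dataframe it would be applied with Series.apply.

```python
import re

# One or more consecutive <br> tags act as a single separator.
BR_RUN = re.compile(r"(?:<br>)+")


def br_to_csv_list(cell: str) -> str:
    """Turn '<br><br>A<br>B C<br>' into 'A, B C', keeping spaces inside values."""
    parts = BR_RUN.split(cell)
    # Leading/trailing <br> runs produce empty strings; drop them.
    return ", ".join(p for p in parts if p)
```

Usage: df['Column2'] = df['Column1'].apply(br_to_csv_list). Because only the tags are treated as separators, values such as DO Prem keep their inner space.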
### Replace <br> with a space
df['Column2'] = df['Column1'].str.replace('<br>', ' ', regex=False)
### Get rid of spaces before and after the string
df['Column2'] = df['Column2'].str.strip()
### Replace each run of spaces with a comma
df['Column2'] = df['Column2'].str.replace(r'\s+', ',', regex=True)
As pointed out by TrentonMcKinney, his solution is better: this one breaks values that contain a space in Column1 (for example, DO Prem becomes DO,Prem).

Regular expression and csv | Output more readable

I have a text file that contains different news articles about terrorist attacks. Each article starts with an HTML tag (<p>Advertisement), and I would like to extract a specific piece of information from each article: the number of people wounded in the terrorist attack.
This is a sample of the text file and how the articles are separated:
[<p>Advertisement , By MILAN SCHREUER and ALISSA J. RUBIN OCT. 5, 2016
, BRUSSELS — A man wounded 2 police officers with a knife in Brussels around noon on Wednesday in what the authorities called “a potential terrorist attack.” , The two officers were attacked on the Boulevard Lambermont.....]
[<p>Advertisement ,, By KAREEM FAHIM and MOHAMAD FAHIM ABED JUNE 30, 2016
, At least 33 people were killed and 25 were injured when the Taliban bombed buses carrying police cadets on the outskirts of Kabul, Afghanistan, on Thursday. , KABUL, Afghanistan — Taliban insurgents bombed a convoy of buses carrying police cadets on the outskirts of Kabul, the Afghan capital, on Thursday, killing at least 33 people, including four civilians, according to government officials and the United Nations. , During a year...]
This is my code so far:
import re

with open("News_cleaned_definitive.csv") as text_open:
    text_read = text_open.read()
splitted = text_read.split("<p>")
pattern = r"wounded (\d+)|(\d+) were wounded|(\d+) were injured"
for article in splitted:
    result = re.findall(pattern, article)
    print(result)
The output that I get is:
[]
[]
[]
[('', '40', '')]
[('', '150', '')]
[('94', '', '')]
I would like to make the output more readable and then save it as a CSV file:
article_1,0
article_2,0
article_3,40
article_3,150
article_3,94
Any suggestions on how to make it more readable?
I rewrote your loop like this and merged it with the CSV writing, since you requested it:
import csv
import re

with open("wounded.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter=",")
    for i, article in enumerate(splitted):
        result = re.findall(pattern, article)
        nb_casualties = sum(int(x) for x in result[0] if x) if result else 0
        row = ["article_{}".format(i + 1), nb_casualties]
        writer.writerow(row)
Get the index of the article using enumerate.
Sum the number of victims (in case more than one group matches) using a generator expression that converts each non-empty group to an integer and passes it to sum, but only if something matched (the ternary expression checks that).
Create the row.
Print it, or write it as one row per iteration of a csv.writer object, as done here.
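The same steps, sketched as self-contained functions so they can be tried on a string instead of the real file. The helper names wounded_counts and to_csv are hypothetical. Like the answer, it uses only the first match in each article; unlike the answer, it drops empty chunks produced by split (such as the text before the first <p>) so numbering starts at the first article. csv.writer writes to an io.StringIO here; with a real file, open it with newline="" as above.

```python
import csv
import io
import re

# Same alternation as in the question; exactly one group is non-empty per match.
PATTERN = re.compile(r"wounded (\d+)|(\d+) were wounded|(\d+) were injured")


def wounded_counts(text):
    """Return [(label, count), ...], one row per article split on '<p>'."""
    articles = [a for a in text.split("<p>") if a.strip()]  # drop empty chunks
    rows = []
    for i, article in enumerate(articles, start=1):
        matches = PATTERN.findall(article)
        # matches[0] is the group tuple of the first match, e.g. ('', '', '25').
        count = sum(int(g) for g in matches[0] if g) if matches else 0
        rows.append((f"article_{i}", count))
    return rows


def to_csv(rows):
    """Serialize rows as CSV text (one 'label,count' line per article)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

For example, three short articles mentioning "wounded 2", "25 were injured", and no casualties produce article_1,2 / article_2,25 / article_3,0.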

Check if a string value matches up with the content of an existing table in Oracle 11G

At the moment I am not working as efficiently as I could be. For the problem I have, I am almost certain there is a smarter and better way to solve it.
What I am trying to do:
I have a string like this:
'NL 4633 4809 KTU'
The NL is a country code from an existing table, and KTU is a university code from an existing table. I need to pass this string to my function and check whether it is valid.
In my function (to validate the string), this is what I am working on. I have managed to split up the string with this:
countryCode := checkISIN; -- checkISIN is the full string ('NL 4633 4809 KTU') and I am giving the NL value to this variable. countryCode is the type varchar2(50)
countryCode := regexp_substr(countryCode, '[^ ]+', 1, 1);
Now I have the country code, as shown below:
NL
Has valid country code
I want to validate the country code by checking that it exists in its own table. I tried this:
if countryCode in ('NL', 'FR', 'DE', 'GB', 'BE', 'US', 'CA')
then dbms_output.put_line('Has valid country code');
else
dbms_output.put_line('Has invalid country code. Change the country code to a valid one');
end if;
This works, but it's not dynamic. If someone adds a country code, I have to change the function again.
So is there a (smart/dynamic) way to check the country codes against their existing tables?
I hope my question is not too vague.
Cheers
If you have a country codes table that looks like this:
ID | NAME
----------
1 | NL
2 | FR
3 | BE
then, when you parse the string, you can do something like this:
select count(1)
into v_quan -- v_quan must be declared as a NUMBER in the function
from CountryCodes cc
where cc.name = countryCode;
if v_quan > 0 then
dbms_output.put_line('Has valid country code');
else
dbms_output.put_line('Has invalid country code. Change the country code to a valid one');
end if;
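The point of the lookup is that the list of valid codes lives in the table, so adding a country is a data change rather than a function change. A minimal sketch of the same check in Python, using an in-memory SQLite table as a stand-in for the Oracle CountryCodes table from the answer; the helper names has_valid_country_code and demo_connection are hypothetical.

```python
import sqlite3


def has_valid_country_code(conn: sqlite3.Connection, value: str) -> bool:
    """Check the first space-separated token of value against CountryCodes.name."""
    tokens = value.split()
    if not tokens:
        return False
    # Parameterized lookup: the table, not the code, holds the list of valid codes.
    (found,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM CountryCodes WHERE name = ?)", (tokens[0],)
    ).fetchone()
    return bool(found)


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for the answer's CountryCodes table (ID | NAME)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CountryCodes (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO CountryCodes (id, name) VALUES (?, ?)",
        [(1, "NL"), (2, "FR"), (3, "BE")],
    )
    return conn
```

The same pattern would apply to the university code (the last token) against its own table.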

What is the best way to populate a load file for a date lookup dimension table?

Informix 11.70.TC4:
I have an SQL dimension table that is used to look up a date (pk_date) and return another date (plus1, plus2 or plus3_months) to the client, depending on whether the user selects "1", "2" or "3".
The table schema is as follows:
TABLE date_lookup
(
pk_date DATE,
plus1_months DATE,
plus2_months DATE,
plus3_months DATE
);
UNIQUE INDEX on date_lookup(pk_date);
I have a load file (pipe delimited) containing dates from 01-28-2012 to 03-31-2014.
The following is an example of the load file:
01-28-2012|02-28-2012|03-28-2012|04-28-2012|
01-29-2012|02-29-2012|03-29-2012|04-29-2012|
01-30-2012|02-29-2012|03-30-2012|04-30-2012|
01-31-2012|02-29-2012|03-31-2012|04-30-2012|
...
03-31-2014|04-30-2014|05-31-2014|06-30-2014|
EDIT: Sir Jonathan's SQL statement using DATE(pk_date + n UNITS MONTH) on 11.70.TC5 worked!
I generated a load file with pk_dates from 01-28-2012 to 12-31-2020 and plus1, plus2 and plus3_months set to NULL, loaded it into the date_lookup table, then executed the UPDATE statement below:
UPDATE date_lookup
SET plus1_months = DATE(pk_date + 1 UNITS MONTH),
plus2_months = DATE(pk_date + 2 UNITS MONTH),
plus3_months = DATE(pk_date + 3 UNITS MONTH);
Apparently, DATE() was able to convert pk_date to DATETIME, do the math with TC5's new algorithm, and return the result in DATE format!
The rules for this dimension table are:
If pk_date has 31 days in its month and plus1, plus2 or plus3_months only have 28, 29, or 30 days, then let plus1, plus2 or plus3 equal the last day of that month.
If pk_date has 30 days in its month and plus1, plus2 or plus3 has 28 or 29 days in its month, let them equal the last valid date of that month, and so on.
All other dates fall on the same day of the target month.
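The rules above amount to "add n months, then clamp the day to the last day of the target month". A minimal sketch of that rule in Python (not the Informix or Perl approaches from the answer), writing rows in the question's pipe-delimited mm-dd-yyyy format with a trailing pipe; the function names add_months_clamped and load_file_rows are hypothetical.

```python
import calendar
from datetime import date, timedelta


def add_months_clamped(d: date, n: int) -> date:
    """Add n months to d; if the day does not exist there, use the month's last day."""
    month_index = d.month - 1 + n
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def load_file_rows(start: date, end: date):
    """Yield one load-file line per day from start to end inclusive."""
    d = start
    while d <= end:
        fields = [d] + [add_months_clamped(d, n) for n in (1, 2, 3)]
        yield "|".join(x.strftime("%m-%d-%Y") for x in fields) + "|"
        d += timedelta(days=1)
```

Checked against the sample load file, 01-31-2012 yields 01-31-2012|02-29-2012|03-31-2012|04-30-2012| and 03-31-2014 yields 03-31-2014|04-30-2014|05-31-2014|06-30-2014|.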
My question is: what is the best way to automatically generate pk_dates past 03-31-2014 following the above rules? Can I accomplish this with an SQL script, "sed", or a C program?
EDIT: I mentioned sed because I already have more than two years' worth of data and could perhaps model the rest on this data. Or perhaps a tool like awk is better?
The best technique would be to upgrade to 11.70.TC5 (on 32-bit Windows; generally to 11.70.xC5 or later) and use an expression such as:
SELECT DATE(given_date + n UNITS MONTH)
FROM Wherever
...
The DATETIME code was modified between 11.70.xC4 and 11.70.xC5 to generate dates according to the rules you outline when the dates are as described and you use the + n UNITS MONTH or equivalent notation.
This obviates the need for a table at all. Clearly, though, all your clients would also have to be on 11.70.xC5.
Maybe you can upgrade your development machine to 11.70.xC5, use this property to generate the data for the table there, and distribute the data to your clients.
If upgrading at least someone to 11.70.xC5 is not an option, then consider the Perl script suggestion.
Can it be done with SQL? Probably, but it would be excruciating. Ditto for C, and I think 'no' is the answer for sed.
However, a couple of dozen lines of Perl seem to produce what you need:
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;
my @dates;
# parse arguments (mm-dd-yyyy)
while (my $datep = shift) {
    my ($m, $d, $y) = split('-', $datep);
    push(@dates, DateTime->new(year => $y, month => $m, day => $d))
        || die "Cannot parse date $!\n";
}
open(STDOUT, ">", "output.unl") || die "Unable to create output file.";
my ($date, $end) = @dates;
while ($date <= $end) {    # include the end date
    my @row = ($date->mdy('-'));    # start with pk_date
    for my $mth (qw[ 1 2 3 ]) {
        my $fut_d = $date->clone->add(months => $mth);
        until (
            ($fut_d->month == $date->month + $mth
                && $fut_d->year == $date->year) ||
            ($fut_d->month == $date->month + $mth - 12
                && $fut_d->year > $date->year)
        ) {
            $fut_d->subtract(days => 1);    # step back until criteria met
        }
        push(@row, $fut_d->mdy('-'));
    }
    print STDOUT join("|", @row, "\n");    # trailing "|" before the newline, as in the load file
    $date->add(days => 1);
}
Save that as futuredates.pl, chmod +x it, and execute it like this:
$ futuredates.pl 04-01-2014 12-31-2020
That seems to do the trick for me.